Background.

We demonstrate how to import and export gated cytometry data into R/Bioconductor using the CytoML R/Bioconductor package. CytoML fits into a important niche in the data analysis pipeline by enabling reproducible analysis and data sharing of manually and computationally gated cytometry data across several different vendor platforms.

BD’s DIVA software

Beckton Dickinson’s (BD’s) DIVA software is used as the acquisition software for all BD instruments. It is often used to define gates for sorting cell populations and also for data analysis. DIVA’s analyses are stored as XML files. No special conversion needs to be done to read these using CytoML.

DIVA XML organizes the different gating groups into different ‘specimen’ nodes under an ‘experiment’ node. Each ‘specimen’ group contains multiple ‘tube’ nodes, which represent FCS samples. All gate definitions are directly stored under ‘tube’ node. The gating hierarchy can be derived from the ‘parent’ attribute of each gate and each gate is uniquely labeled with the full gating path (i.e. ‘A\B\C’).

Gate definitions are not Gating-ML compliant. Currently CytoML supports RECTANGLE_REGION, POLYGON_REGION, INTERVAL_REGION gates. DIVA supports log or biexp transformations. The transformation is determined by the attributes under the ‘tube/instrument_settings’ node. The global transformation parameters can be overridden by gate-level attributes.

FlowJo workspaces.

FlowJo (Ashland, OR) is a graphical data analysis platform used primarily for manual gating of cytometry data. Older versions of FlowJo (< v. 10) stored workspaces in binary format (so-called “.jo” files) that could be exported to XML from within FlowJo. Newer versions of FlowJo (>= v. 10) store workspaces in XML format natively (“.wsp” files). The “.wsp” files can work natively with the CytoML package.

FlowJo XML (latest v 10) stores the gates, transformation and compensation information at the sample level (‘Sample’ node). The gating hierarchy is naturally captured in the nested ‘subPopulation’ xml nodes. The gate definitions are partially Gating-ML 2.0 compliant. Currently the gates supported by CytoML are NotNode, OrNode, AndNode, PolygonGate, RectangleGate (a QuadGate is typically stored as four RectangleGate objects), EllipsoidGate, and CurlyQuad.

Cytobank workspace files.

Cytobank encodes analyses in XML format as well. No special conversion needs to be done to read Cytobank XML files.

Cytobank XML is the most Gating-ML 2.0 standard-conforming format (http://isac-net.org/Resources/Standards.aspx). All the gate objects as well as the transformation and compensation nodes are stored in Gating-ML 2.0 format and thus can be directly parsed into R through the flowUtils package. However, these objects are organized in a flat structure. That means every gate is stored at the root level of the XML document. Gates are also replicated across samples if the gate coordinates are not identical for some samples. In these cases, the sample/FCS information is attached to the gate node through the ‘custom_info’ property. Consequently, the gating hierarchy needs to be reconstructed from the what Cytobank calls a ‘GateSet’ node. These ‘GateSet’ nodes are composed of ‘BooleanGate’ nodes that use the ‘AND’ operator to organize the sequence of gates that need to be applied to identify a specific cell population. The ‘QuadrantGate’ is treated differently since each ‘quadrant’ essentially represents a separate gate and thus is not parsed through a ‘GateSet’. Besides the standard Gating-ML contents, each Gating-ML node also carries the extra ‘custom_info’ which stores some critical information (e.g. gate name, gate definition in json format) that’s required for the Cytobank parser work with its own XML. Both the transformation and compensation properties are global (e.g., apply to all samples). The counts of cells in each cell population are not stored in the Cytobank XML and need to be imported as a separate csv, or they can be recomputed from the raw data once imported into R.

A unifying data representation.

Despite the differences between how each platform represents a gated data analysis, once imported, the data are represented using a common set of underlying data structures, implemented in the flowWorkspace package. Individual samples are mapped to GatingHierarchy objects, and collections of them with a common gating scheme are combined into GatingSet objects. A GatingSet/GatingHierarchy stores along with it the gates, transformations, compensation matrices, and sample meta-data necessary to reproduce an analysis from raw FCS files. All the same information that is stored in the platform-specific XML files. Once these data are in a common representation it CytoML allows us to transform data between platforms.

Share computational gating results.

Because of the common data representation, data that are gated using computational tools can be represented in the same manner and can be exported to the different platforms for visualization. This enables computational analyses to be shared with, and the data to be inspected by, users of these other platforms. Additionally, it allows the flexibility of mixing manual with computational analysis. Users can, for example, import manually gated data and perform computational gating or visualization of a subset of cells within the data set very easily using the flowWorskpace and ggcyto packages (Lin, Frelinger, et al. (2015), Finak and Jiang (2011b), Van et al. (2018)).

Examples

These examples require the flowWorkspaceData package as well as the CytoML package and its dependencies (installation instructions follow). All cytometry figures (except the plots from FlowJo and Cytobank) are generated using the ggcyto package (Van et al. (2018)). Automated gating is performed using the openCyto package (Finak et al. (2014)), as well as FlowSOM (Van Gassen et al. (2015)).

Installation

These packages and dependencies can be installed as below. Furthermore, it is recommended that this workflow is executed using the current release of R (v3.5.0 at time of writing).

# A helper function to install from various sources.
install_conditional <- function(pkg, source = c("Bioc", "CRAN", "RGLab")) {
  source <- match.arg(source, c("Bioc", "CRAN", "RGLab"))
  utils::chooseCRANmirror(ind = 64)
  for (p in pkg) {
    if (source == "CRAN") {
      if (is.na(match(p, installed.packages()[, "Package"]))) {
        install.packages(p, ask = FALSE)
      } else {
        update.packages(p)
      }
    } else if (source == "Bioc") {
        BiocManager::install(p, ask = FALSE, update = FALSE, type = "source")
    
    } else {
      devtools::install_github(paste("RGLab", p, sep = "/"), ref = "trunk")
    }
  }
}

# install from CRAN  
install_conditional(
  c(
  "MLmetrics",
  "webshot",
  "Gmisc",
  "rsvg",
  "DiagrammeR",
  "DiagrammeRsvg",
  "BiocManager",
  "devtools",
  "cowplot",
  "png",
  "grid",
  "gridExtra",
  "gtable"
  ),
  source = "CRAN"
  )
  # install specific releases from  RGLab github
devtools::install_github("RGLab/RProtoBufLib@v1.3.7")
devtools::install_github("RGLab/cytolib@v1.3.2")
devtools::install_github("RGLab/flowCore@v1.47.7")
devtools::install_github("RGLab/flowWorkspace@v3.29.7")
devtools::install_github("RGLab/openCyto@v1.19.2")
devtools::install_github("RGLab/CytoML@v1.7.10")
devtools::install_github("RGLab/ggcyto@v1.9.12")
# install from Bioc
install_conditional(c("FlowSOM", "flowWorkspaceData"), source = "Bioc")

#Load required libraries
library(flowWorkspace)
library(png)
library(webshot)
library(openCyto)
library(rsvg)
library(grid)
library(MLmetrics)
library(DiagrammeR)
library(DiagrammeRsvg)
library(Gmisc)
library(gridExtra)
library(gtable)
library(cowplot)
library(ggcyto)
library(FlowSOM)
library(CytoML)

#you may need to run this if it's not installed.
# webshot::install_phantomjs()
# A helper function to split clustering from FlowSOM into lists of
# cell memberships that can be added to a GatingSet.
cluster_split_by_flowSet <- function(clusters,fs) {
  x <- c(1,cumsum(fsApply(fs, nrow)))
  out <- NULL
  for (i in 1:(length(x) - 1)) {
  out <- rbind(out, c(ifelse(x[i] == 1, x[i], x[i] + 1), x[i + 1]))
  }
  split_indices <- do.call(c, sapply(1:nrow(out), function(i) {
    l <- length(out[i, 1]:out[i, 2])
    rep(i, l)
  }))
  split(clusters, split_indices)
}

The CytoML Workflow

The role of CytoML in a workflow is shown below. CytoML is used to import platform-specific workspace files into GatingSet and GatingHierarchy representations in R. These representations are implemented in the flowWorkspace package, on which CytoML relies. The R representations can be augmented with computational gating approaches to perform additional data-driven gating in two or higher dimensions (i.e., clustering). CytoML can export the R representation back to FlowJo or Cytobank platform-specific representations.

CytoML uses the compensation matrices and data transformations specified in the Gating-ML output by each platform to reproduce the data analysis in R. Consequently, cell-level information is available to the user. The membership and marker expression of each cell in each population and gate is available, and can be augmented with computational gating approaches, or can be extracted for complex downstream statistical modeling. The latter strategy has been leveraged to model T cell polyfunctionality in HIV vaccine and TB disease studies and others (Lin, Finak, et al. (2015), Rakshit et al. (2017), Alberer et al. (2017)).

The diagram below is generated using the DiagrammeR package and is defined in mermaid format.

workflow <- DiagrammeR("graph TB;
  A(Diva) ---|<b>read/write<br> gates, compensation<br>& transformation</b>| D[<center>Platform-Specific<br>Workspace</center>];
  B(FlowJo) ---|<b>read/write<br> gates, compensation<br>& transformation</b>| D;
  C(Cytobank) ---|<b>read/write<br> gates, compensation<br>& transformation</b>| D;
  D -->|<b>Import<br>gates<br>compensation<br>& transformation</b>| E(<center>CytoML</center>);
  E -->|<b>Reproduce<br>compensation,<br>transformation & <br>gating on raw FCS files</b>| F[<center>R GatingSet and GatingHierarchy<br>leveraging flowWorkspace</center>];
  O(Computational Gating<br> in R) --> |<b>Augment gating tree</b>| F
  F -->|<b>Convert gates,<br>transforms,<br>compensation<br>to Gating-ML</b>| E(<center>Cyto-ML</center>);
  E -->|<b>Export<br>FlowJo and<br>Cytobank</b>| D;

style E fill: #aede9c
")
workflow

Supplementary Figure 1: A schematic representation outlining the CytoML workflow shows how gates, compensation, data transformations are converted to / from platform-specific representations to an R representation in order to reproduce an analysis, combine a manual analysis with computational gating, and export the analysis back to specific platforms.

Importing and Visualizing FlowJo workspaces.

FlowJo import is demonstrated on a pair of samples stained using a Lyoplate T-cell panel. The data are from a published FlowCAP IV study@Finak2016-dc, where triplicate samples were distributed to nine centers for staining, analysis, and gating using a variety of Lyoplate panels. Data were gated manually at each center, manually by a common analyst, and using computational methods, then the contribution to overall variability of gating and staining was assessed. The complete data are available from ImmuneSpace from this link (free ImmuneSpace sign up and login required) (Sauteraud et al. (2016), Brusic et al. (2014)).. The FCS files used here for demonstration are distributed with the flowWorkspaceData package (Finak and Jiang (2011a)) in BioConductor.

We find the path to the manual gating XML FlowJo file that’s distributed with the flowWorkspaceData package.

manual_gating_xml = 
  system.file("extdata", 
              package = "flowWorkspaceData", 
              "manual.xml")

We need to do two things to import the data. We need to open the workspace, then we need to choose a group of files to import. We see that this workspace has six sample groups, five of which correspond to different staining panels (for B cells, DCs, T cells, T helper cells, and T regulatory cells). The sixth is a group of “All Samples” that is almost always present in FlowJo workspaces, and should rarely be imported into R, since it would construct a GatingSet of samples with inconsistent (i.e. different) gating trees.

ws = openWorkspace(manual_gating_xml)
print(ws)
## FlowJo Workspace Version  2.0 
## File location:  /usr/local/R-3.6.0/library/flowWorkspaceData/extdata 
## File name:  manual.xml 
## Workspace is open. 
## 
## Groups in Workspace
##          Name Num.Samples
## 1 All Samples          45
## 2      B-cell           4
## 3          DC           4
## 4      T-cell           4
## 5     Thelper           4
## 6        Treg           4

We will import the T-cell group. We need to point the parser at the right location where the FCS files are located.

path = system.file("extdata",package="flowWorkspaceData")
gs_tcell = parseWorkspace(ws, name = "T-cell", path = path)
## mac version of flowJo workspace recognized.
gs_tcell
## A GatingSet with 2 samples

Two of the files are available in the data package. We can plot the gating tree from this analysis, as well as produce dot plots of individual gated populations using Bioconductor’s tools.

plot(gs_tcell)
Figure 1: Gating tree of the Cytotrol T-cell data imported from a FlowJo workspace using CytoML and flowWorkspace.

Figure 1: Gating tree of the Cytotrol T-cell data imported from a FlowJo workspace using CytoML and flowWorkspace.

The gating tree shows gating of different memory and activated CD4 and CD8 T cell populations.

p = autoplot(gs_tcell[[1]], bin = 128)
Figure 2: The gating hierarchy for a Cytotrol T cell sample imported from a FlowJo gating hierarchy.

Figure 2: The gating hierarchy for a Cytotrol T cell sample imported from a FlowJo gating hierarchy.

Adding a lymphocyte gate using openCyto.

We’ll use openCyto (Finak et al. (2014)) to add a two-dimensional lymphocyte gate on the FSC-A, SSC-A channels beneath the singlet gate.

#Plot the FSC,SSC dimensions
ggcyto(gs_tcell,
       subset = "singlets",
       mapping = aes(x = "FSC-A", y = "SSC-A")) +
       geom_hex(bins = 128) +
       theme_classic() +
       theme_cowplot()
Supplementary Figure 2: The ungated lymphocyte population shown in FSC-A and SSC-A.

Supplementary Figure 2: The ungated lymphocyte population shown in FSC-A and SSC-A.

Now we can use openCyto to add a computationally defined lymphocyte gate to the data. The gating uses the flowClust software (Finak et al. (2014), Lo et al. (2009)).

# Add a gate using opencyto
add_pop(gs = gs_tcell,
        alias = "lymphocytes",
        pop = "+",
        parent = "singlets",
        dims = "FSC-A,SSC-A",
        gating_method = "flowClust",
        gating_args = "K=3")

#move the CD3+ subtree beneath the lymphocyte gate
for (i in sampleNames(gs_tcell))
  flowWorkspace::moveNode(gh = gs_tcell[[i]],
                          node = "CD3+",
                          to = "lymphocytes")
# recalculate the cell population statistics.
recompute(gs_tcell)

Combining manual analysis with high dimensional clustering

We are not limited to two-dimensional gating. We can leverage other computational approaches to cluster cytometry data. Here we will show how to use FlowSOM (Van Gassen et al. (2015)) to cluster the lymphocyte population we just added to the GatingSet.

# Get the lymphocyte population at the single-cell level.
fs_to_som <- getData(gs_tcell,"lymphocytes")

#set a seed for reproducibility
set.seed(100) 

## From the internals of FlowSOM
fsom <- ReadInput(as.flowSet(fs_to_som), scale = TRUE,transform = FALSE,compensate = FALSE)
fsom <- BuildSOM(fsom)
fsom <- BuildMST(fsom)
maxMeta <- 50 # max metaclusters
cl <- as.factor(MetaClustering(fsom$map$codes, "metaClustering_consensus", maxMeta))
som_result <- list(FlowSOM = fsom, metaclustering = cl)

# These are the cell-level cluster assignments across both samples.
cluster_assignments <- som_result[[2]][som_result[[1]]$map$mapping[,1]]

# Tabulate the number of cells per cluster.
table(cluster_assignments)
## cluster_assignments
##     1     2     3     4     5     6     7     8     9    10    11    12 
## 22719 39875 16199 12489   278  3498   656  1391 14290   621   379   551 
##    13    14 
## 12213   599
# The number of clusters found
nclusters <- length(table(cluster_assignments)) 
# split the cluster assignments across the flow set.
# Returns a list of factors. Length of the list is the number of samples.
# Length of each list element is the number of cells in that sample.
# Contents of each list element is a vector of cell population memberships of each cell in the sample.
cluster_assignments <-
  cluster_split_by_flowSet(clusters = cluster_assignments,
                           fs = fs_to_som)

# name the samples and add to the gating set.
names(cluster_assignments) <- sampleNames(gs_tcell)
# This adds the cluster assignments to the GatingSet
add(gs_tcell,cluster_assignments,parent = "lymphocytes", name = "flowSOM")
## NULL
# ensures counts are accurate
recompute(gs_tcell)

Accessing cell level data

Users can leverage the breadth of flowWorkspace APIs to extract information on single cell data. Below we show a few examples of what can be done.

# get the single cell data that are part of the lymphocyte population we just created.
# returns a flowSet (or disk-backed ncdfFlowSet).
fs <- getData(gs_tcell, "lymphocytes")
# Expression data (here for the first sample) can be fetched by exprs()
head(exprs(fs[[1]]))
##          FSC-A  FSC-H    FSC-W    SSC-A  <B710-A> <R660-A>  <R780-A>
## [1,] 140733.05 133376 69150.98 91113.96 3106.0037 3302.719 2073.3540
## [2,] 127717.88 119616 69974.92 76954.91 1296.8107 1931.995 3769.8376
## [3,] 134347.02 125651 70071.60 70116.48 3128.8455 1834.073 1607.8027
## [4,]  95773.81  91070 68920.97 50826.72  745.4614 1544.511  660.6263
## [5,]  84330.34  77826 71013.20 66052.55 1465.2563 1523.407 3702.2285
## [6,] 101553.95  96602 68895.48 44620.80 2902.9307 2458.440  482.8756
##       <V450-A> <V545-A> <G560-A> <G780-A>  Time
## [1,] 2996.5356 1795.198 2397.642 2860.906 0.002
## [2,] 2749.2837 1660.914 2883.802 3480.662 0.007
## [3,] 2507.7080 1727.784 2194.188 1001.111 0.007
## [4,]  924.5186 2598.948 1354.830 3180.216 0.009
## [5,] 2657.5498 1612.924 2800.220 3415.069 0.015
## [6,] 2700.8479 1561.626 2759.033 3263.364 0.015
# Indices of cells in the lymphocyte population can be extracted (here for the first sample)
# how many cells are in the population?
table(getIndices(gs_tcell[[1]],"lymphocytes")) 
## 
## FALSE  TRUE 
## 56595 62936
# equally:
dim(fs[[1]])
##     events parameters 
##      62936         12
# A vector: TRUE means the cell is in the gate, FALSE means it is outside the gate.
head(getIndices(gs_tcell[[1]],"lymphocytes"),10) 
##  [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
# What's the overlap between CD4 T cells and our FlowSOM clusters in the first sample?
# Blot a bar plot of F measure vs cluster id comparing each cluster to the total CD4 T cell gate.
fmeasures <- sapply(1:nclusters,function(x)MLmetrics::F1_Score(getIndices(gs_tcell[[1]],"CD4"), getIndices(gs_tcell[[1]], paste0("flowSOM_",as.character(x)))))

par(mar = c(7,5,2,2))
ggplot(data.frame(
  "F measure" = fmeasures,
  "cluster" = paste0("flowSOM_", 1:nclusters),
  check.names = FALSE
  )) + geom_bar(stat = "_identity") + aes(x = cluster, y = `F measure`) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_x_discrete(limits = paste0("flowSOM_", 1:nclusters))
Supplementary Figure 3: Barplot of the F1 score (F measure) comparing the agreement of cells in the total CD4+ T cell gate against each FlowSOM cluster for sample one of the Cytotrol Lyoplate data.

Supplementary Figure 3: Barplot of the F1 score (F measure) comparing the agreement of cells in the total CD4+ T cell gate against each FlowSOM cluster for sample one of the Cytotrol Lyoplate data.

We can plot the CD4+ T cells along with their gate and ask where cells in the most similar FlowSOM cluster fall in this gate.

# The cluster with the max fmeasure has the most similarity to the total CD4 T cell gate
selected <- paste0("flowSOM_", 1:nclusters)[which.max(fmeasures)]
# Overlay plot showing the CD4 T cell gate on sample 1  and the cells selected by FlowSOM cluster 2 highlighted as black points.
autoplot(gs_tcell[[1]],"CD4") + geom_overlay(as.character(selected), alpha = 0.3)
Supplementary Figure 4: Overlay plot showing the CD4 T cell gate on sample one and the cells selected by FlowSOM cluster 2 highlighted as black points.

Supplementary Figure 4: Overlay plot showing the CD4 T cell gate on sample one and the cells selected by FlowSOM cluster 2 highlighted as black points.

Exporting to FlowJo workspace format.

Finally we’ll use CytoML to export the new modified gating set to a FlowJo XML workspace.

CytoML::GatingSet2flowJo(gs_tcell,
                         outFile = "new_flowjo_xml.xml")
## [1] "new_flowjo_xml.xml"
system("open -a /Applications/FlowJo.app new_flowjo_xml.xml")

Visualizing high dimensional clustering in FlowJo

The view of the high dimensional clustering results exported and visualized in FlowJo is shown below. The clusters are exported as a derived channel in the flowJo workspace file, together with non-standard nodes in the gating tree that describe the clusters. This representation can be viewed in FlowJo, but not on other platforms.

Supplementary Figure 5: Viewing high dimensional clustering results exported from R in FlowJo.

Supplementary Figure 5: Viewing high dimensional clustering results exported from R in FlowJo.

FlowJo representation of clustered data.

Currently, clustering results can only be exported to FlowJo workspaces. This is due to the lack of support for cluster results in Gating-ML. To represent clustering results, a new channel is defined that holds the cluster membership for each cell. This new channel together with new nodes in the gating hierarchy representing each cluster are stored in the workspace output by CytoML and can in turn be read by FlowJo.

Comparison of traditional bivariate gating of lymphocytes in FlowJo and R

img = png::readPNG("new_flowjo_xml-Layout.png")
g <- rasterGrob(img, interpolate = TRUE)
p1 = ggcyto(gs_tcell,
            subset = "singlets",
            mapping = aes(x = "FSC-A", y = "SSC-A")) + 
  geom_hex(bins = 128) + geom_gate("lymphocytes") + 
  geom_stats() + theme(axis.text.x =
            element_text(angle = 45, hjust = 1)) + 
  theme_classic(base_size = 10)
            p1 = as.ggplot(p1)
p1 = ggplotGrob(p1)
g = rasterGrob(img,interpolate = TRUE)

grid.newpage()
## main viewports, I divide the scene in 10 rows ans 5 columns(5 pictures)
pushViewport(plotViewport(margins = c(1,1,1,1),
             layout=grid.layout(nrow=10, ncol=100),
             xscale =c(1,5)))
## I put in the 1:7 rows the plot without axis
## I define my nested viewport then I plot it as a grob.
pushViewport(plotViewport(layout.pos.col=5:95,
                          layout.pos.row=1:4,
             margins = c(1,1,1,1)))
grid.draw(p1)
upViewport()
grid.text("A",0,1)
pushViewport(plotViewport(layout.pos.col=1:100,
                          layout.pos.row=5:10,
             margins = c(1,1,1,1)))
grid.draw(g)
upViewport()
grid.text("B",0 ,0.55)
Figure 3: Lymphocyte gates visualized in openCyto (A) and in FlowJo (B). A workspace of manually gated data was imported into R, modified using openCyto to add the lymphocyte gate, and exported to a new FlowJo workspace using the CytoML package.

Figure 3: Lymphocyte gates visualized in openCyto (A) and in FlowJo (B). A workspace of manually gated data was imported into R, modified using openCyto to add the lymphocyte gate, and exported to a new FlowJo workspace using the CytoML package.

The plots from openCyto (Figure 4A) and the exported FlowJo workspace (Figure 4B) show the same information (and have the same population statistics), with some minor style differences.

Exporting to Cytobank format.

We can take the same gating set and export it to Cytobank format. We need to remove the “flowSOM” clusters from the tree because export of these to Cytobank is not currently supported.

# FlowSOM gates are not supported:
sapply(paste0("flowSOM_",1:nclusters),function(x)Rm(x,gs_tcell))
##                                flowSOM_1 flowSOM_2 flowSOM_3 flowSOM_4
## CytoTrol_CytoTrol_1.fcs_119531 NULL      NULL      NULL      NULL     
## CytoTrol_CytoTrol_2.fcs_115728 NULL      NULL      NULL      NULL     
##                                flowSOM_5 flowSOM_6 flowSOM_7 flowSOM_8
## CytoTrol_CytoTrol_1.fcs_119531 NULL      NULL      NULL      NULL     
## CytoTrol_CytoTrol_2.fcs_115728 NULL      NULL      NULL      NULL     
##                                flowSOM_9 flowSOM_10 flowSOM_11 flowSOM_12
## CytoTrol_CytoTrol_1.fcs_119531 NULL      NULL       NULL       NULL      
## CytoTrol_CytoTrol_2.fcs_115728 NULL      NULL       NULL       NULL      
##                                flowSOM_13 flowSOM_14
## CytoTrol_CytoTrol_1.fcs_119531 NULL       NULL      
## CytoTrol_CytoTrol_2.fcs_115728 NULL       NULL
GatingSet2cytobank(gs_tcell,
                   outFile = "new_cytobank_xml_cytobank_scale.xml",
                   cytobank.default.scale = TRUE)
## [1] "new_cytobank_xml_cytobank_scale.xml"

This gating-ml file is imported into a Cytobank experiment along with the FCS files. The experiment is public and can be accessed at the Cytobank community site. The Cytobank imported gates can be seen below:

Figure 4: Cytobank display of the gating hierarchy for a manually gated Cytotrol T cell sample, with an added openCyto lymphocyte gate.

Figure 4: Cytobank display of the gating hierarchy for a manually gated Cytotrol T cell sample, with an added openCyto lymphocyte gate.

Importing from Cytobank.

Next we will import a Cytobank workspace. For this purpose we will work with an existing public data set on Cytobank. We use the public experiment “Kinase Inhibitor-Treated Reprogramming MEFs” (Zunder et al. (2015)). Paraphrasing from the study abstract, this is a study using mass cytometry to analyze cellular reprogramming at the single-cell level. The authors made simultaneous measurement of markers of pluripotency, differentiation, cell-cycle status, and cellular signaling throughout the reprogramming process, and performed time-resolved progression analysis of the resulting data set to construct a continuous molecular road-map for three independent reprogramming systems.

The URL for this study is (https://community.cytobank.org/cytobank/experiments/43281), with Cytobank ID 43281.

We download the ACS container manually and unzip it to access the contents.

gating_ml_cytobank_file = list.files(
  path = "experiment_43281_Jun-05-2018_11-56-AM",
  pattern = "gate.*xml",
  recursive = TRUE,
  full.names = TRUE
  )
  fcs_files = list.files(
  path = file.path(
  "experiment_43281_Jun-05-2018_11-56-AM",
  "experiments",
  "43281",
  "fcs_files"
  ),
  recursive = TRUE,
  full.names = TRUE
  )
  #Import the Cytobank xml workspace to create a GatingSet.
  gs = cytobank2GatingSet(xml = gating_ml_cytobank_file, FCS = fcs_files)
  
  #Arrange the plots of the 6 samples using the ggcyto package.
  plots = list()
  for (j in 1:length(gs)) {
    p = autoplot(gs[[j]], bins = 128) + theme_classic() 
    plots[[j]] = ggcyto_arrange(p, nrow = 1)
  }
  plot(do.call(gridExtra::gtable_rbind,plots[1:3]))
Figure 5: Samples 1 through 3 from Cytobank experiment 43281

Figure 5: Samples 1 through 3 from Cytobank experiment 43281

cap = c("**Figure 5**: Samples 1 through 3 from Cytobank experiment 43281")

Importing Diva XML

While not shown in the publication, this process is analogous to importing FlowJo XML. We will import a gated FCS file found in the flowWorkspaceData package.

files = list.files(system.file("extdata", 
                               "diva", 
                               package = "flowWorkspaceData"),
                   full.names = TRUE)
ws = openDiva(files[2]) #diva xml
ws
## Diva Workspace Version  Version 6.1.3 
## File location:  /usr/local/R-3.6.0/library/flowWorkspaceData/extdata/diva 
## File name:  PE_2.xml 
## Workspace is open. 
## 
## Groups in Workspace
##                specimen samples
## 1 Compensation Controls       9
## 2                    PE       4
#read in the one FCS file that's present. Suppress warnings that we can't find 3 others.
#PE is the name of the sample group.
gs = suppressWarnings(
  parseWorkspace(ws, 
                 name = "PE",
                 path = dirname(files[1])))
autoplot(gs[[1]]) + 
  theme_classic()
Visualization of the gating hierarchi for a DIVA XML file in the flowWorkspaceData package.

Visualization of the gating hierarchi for a DIVA XML file in the flowWorkspaceData package.

Session Info

sessionInfo()
## R Under development (unstable) (2018-08-14 r75143)
## Platform: x86_64-apple-darwin16.7.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /usr/local/R-3.6.0/lib/libRblas.dylib
## LAPACK: /usr/local/R-3.6.0/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] hexbin_1.27.2             CytoML_1.7.10            
##  [3] FlowSOM_1.13.0            igraph_1.2.2             
##  [5] ggcyto_1.9.12             cowplot_0.9.3            
##  [7] ggplot2_3.0.0             gtable_0.2.0             
##  [9] gridExtra_2.3             Gmisc_1.6.2              
## [11] htmlTable_1.12            Rcpp_0.12.18             
## [13] DiagrammeRsvg_0.1         DiagrammeR_1.0.0         
## [15] MLmetrics_1.1.1           rsvg_1.3                 
## [17] openCyto_1.19.2           webshot_0.5.0            
## [19] png_0.1-7                 flowWorkspace_3.29.7     
## [21] ncdfFlow_2.27.0           BH_1.66.0-1              
## [23] RcppArmadillo_0.9.100.5.0 flowCore_1.47.7          
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_1.3-2            mclust_5.4.1               
##   [3] rprojroot_1.3-2             corpcor_1.6.9              
##   [5] base64enc_0.1-3             clue_0.3-55                
##   [7] rstudioapi_0.7              IDPmisc_1.1.17             
##   [9] mvtnorm_1.0-8               splines_3.6.0              
##  [11] R.methodsS3_1.7.1           mnormt_1.5-5               
##  [13] robustbase_0.93-2           knitr_1.20                 
##  [15] Formula_1.2-3               jsonlite_1.5               
##  [17] cluster_2.0.7-1             R.oo_1.22.0                
##  [19] graph_1.59.0                BiocManager_1.30.1         
##  [21] rrcov_1.4-4                 readr_1.1.1                
##  [23] compiler_3.6.0              httr_1.3.1                 
##  [25] backports_1.1.2             assertthat_0.2.0           
##  [27] Matrix_1.2-14               lazyeval_0.2.1             
##  [29] acepack_1.4.1               visNetwork_2.0.4           
##  [31] htmltools_0.3.6             tools_3.6.0                
##  [33] bindrcpp_0.2.2              glue_1.3.0                 
##  [35] dplyr_0.7.6                 V8_1.5                     
##  [37] Biobase_2.41.2              rgexf_0.15.3               
##  [39] flowUtils_1.45.0            stringr_1.3.1              
##  [41] gtools_3.8.1                devtools_1.13.6            
##  [43] XML_3.98-1.16               DEoptimR_1.0-8             
##  [45] zlibbioc_1.27.0             MASS_7.3-50                
##  [47] scales_1.0.0                hms_0.4.2                  
##  [49] RBGL_1.57.0                 RColorBrewer_1.1-2         
##  [51] yaml_2.2.0                  curl_3.2                   
##  [53] memoise_1.1.0               downloader_0.4             
##  [55] rpart_4.1-13                latticeExtra_0.6-28        
##  [57] stringi_1.2.4               highr_0.7                  
##  [59] Rook_1.1-1                  pcaPP_1.9-73               
##  [61] flowClust_3.19.1            checkmate_1.8.5            
##  [63] BiocGenerics_0.27.1         flowViz_1.45.0             
##  [65] rlang_0.2.2                 pkgconfig_2.0.2            
##  [67] matrixStats_0.54.0          evaluate_0.11              
##  [69] fda_2.4.8                   lattice_0.20-35            
##  [71] purrr_0.2.5                 bindr_0.1.1                
##  [73] labeling_0.3                ks_1.11.3                  
##  [75] htmlwidgets_1.2             tidyselect_0.2.4           
##  [77] plyr_1.8.4                  magrittr_1.5               
##  [79] R6_2.2.2                    Hmisc_4.1-1                
##  [81] RUnit_0.4.32                foreign_0.8-71             
##  [83] pillar_1.3.0                withr_2.1.2                
##  [85] nnet_7.3-12                 survival_2.42-6            
##  [87] forestplot_1.7.2            abind_1.4-5                
##  [89] tsne_0.1-3                  tibble_1.4.2               
##  [91] crayon_1.3.4                KernSmooth_2.23-15         
##  [93] ellipse_0.4.1               rmarkdown_1.10             
##  [95] viridis_0.5.1               data.table_1.11.4          
##  [97] influenceR_0.1.0            git2r_0.23.0               
##  [99] Rgraphviz_2.25.0            ConsensusClusterPlus_1.45.0
## [101] digest_0.6.15               tidyr_0.8.1                
## [103] brew_1.0-6                  R.utils_2.7.0              
## [105] flowStats_3.39.0            stats4_3.6.0               
## [107] munsell_0.5.0               viridisLite_0.3.0

References

Alberer, Martin, Ulrike Gnad-Vogt, Henoch Sangjoon Hong, Keyvan Tadjalli Mehr, Linus Backert, Greg Finak, Raphael Gottardo, et al. 2017. “Safety and Immunogenicity of a mRNA Rabies Vaccine in Healthy Adults: An Open-Label, Non-Randomised, Prospective, First-in-Human Phase 1 Clinical Trial.” Lancet, July.

Brusic, Vladimir, Raphael Gottardo, Steven H Kleinstein, Mark M Davis, and HIPC steering committee. 2014. “Computational Resources for High-Dimensional Immune Analysis from the Human Immunology Project Consortium.” Nat. Biotechnol. 32 (2): 146–48.

Finak, Greg, Jacob Frelinger, Wenxin Jiang, Evan W Newell, John Ramey, Mark M Davis, Spyros A Kalams, Stephen C De Rosa, and Raphael Gottardo. 2014. “OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis.” Edited by Andreas Prlic. PLoS Comput. Biol. 10 (8). Public Library of Science: e1003806.

Finak, Greg, and Mike Jiang. 2011a. A Flow Cytometry Data Package for Testing the Core Bioconductor Cytometry Infrastructure. https://doi.org/doi:10.18129/B9.bioc.flowWorkspaceData.

———. 2011b. FlowWorkspace: Infrastructure for Representing and Interacting with Gated and Ungated Cytometry Data Sets. https://doi.org/doi:10.18129/B9.bioc.flowWorkspace.

Lin, Lin, Greg Finak, Kevin Ushey, Chetan Seshadri, Thomas R Hawn, Nicole Frahm, Thomas J Scriba, et al. 2015. “COMPASS Identifies T-Cell Subsets Correlated with Clinical Outcomes.” Nat. Biotechnol. 33 (6). Nature Publishing Group: 610–16.

Lin, Lin, Jacob Frelinger, Wenxin Jiang, Greg Finak, Chetan Seshadri, Pierre-Alexandre Bart, Giuseppe Pantaleo, Julie McElrath, Steve DeRosa, and Raphael Gottardo. 2015. “Identification and Visualization of Multidimensional Antigen-Specific T-Cell Populations in Polychromatic Cytometry Data.” Cytometry A 87 (7). Wiley Online Library: 675–82.

Lo, Kenneth, Florian Hahne, Ryan R Brinkman, and Raphael Gottardo. 2009. “FlowClust: A Bioconductor Package for Automated Gating of Flow Cytometry Data.” BMC Bioinformatics 10 (1). BioMed Central Ltd: 145.

Rakshit, Srabanti, Vasista Adiga, Soumya Nayak, Pravat Nalini Sahoo, Prabhat Kumar Sharma, Krista E van Meijgaarden, Anto Jesuraj, et al. 2017. “Circulating Mycobacterium Tuberculosis DosR Latency Antigen-Specific, Polyfunctional, Regulatory IL10+ Th17 CD4 T-Cells Differentiate Latent from Active Tuberculosis.” Sci. Rep.

Sauteraud, Renan, Lev Dashevskiy, Greg Finak, and Raphael Gottardo. 2016. “ImmuneSpace: Enabling Integrative Modeling of Human Immunological Data.” The Journal of Immunology 196 (1 Supplement): 124.65–124.65.

Van, Phu, Wenxin Jiang, Raphael Gottardo, and Greg Finak. 2018. “GgCyto: Next Generation Open-Source Visualization Software for Cytometry.” Bioinformatics, June.

Van Gassen, Sofie, Britt Callebaut, Mary J Van Helden, Bart N Lambrecht, Piet Demeester, Tom Dhaene, and Yvan Saeys. 2015. “FlowSOM: Using Self-Organizing Maps for Visualization and Interpretation of Cytometry Data.” Cytometry A 87 (7). Wiley Online Library: 636–45.

Zunder, Eli R, Ernesto Lujan, Yury Goltsev, Marius Wernig, and Garry P Nolan. 2015. “A Continuous Molecular Roadmap to iPSC Reprogramming Through Progression Analysis of Single-Cell Mass Cytometry.” Cell Stem Cell 16 (3): 323–37.